Explained Variance Score (explained_variance_score)#
explained_variance_score is a regression metric that asks a very specific question:
How much of the variability (spread) in \(y\) do the predictions \(\hat{y}\) explain?
It is similar to \(R^2\), but with an important twist: it compares the variance of the residuals to the variance of the targets, and is therefore insensitive to constant (additive) bias.
Learning goals#
derive and interpret the metric (including edge cases)
understand its relationship to MSE and \(R^2\)
implement it from scratch in NumPy (sample weights + multi-output)
visualize how it reacts to noise vs bias
see what happens if you try to optimize it directly (gradient descent)
Notation#
True targets: \(y \in \mathbb{R}^n\)
Predictions: \(\hat{y} \in \mathbb{R}^n\)
Residuals: \(r = y - \hat{y}\)
Mean: \(\bar{z} = \frac{1}{n}\sum_{i=1}^n z_i\)
Variance (population / ddof=0): \(\mathrm{Var}(z) = \frac{1}{n}\sum_{i=1}^n (z_i - \bar{z})^2\)
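As a quick sanity check (a minimal sketch; the array is arbitrary), NumPy's np.var defaults to ddof=0, matching the population variance above:
import numpy as np
# NumPy's default variance is the population (ddof=0) version used throughout
z = np.array([1.0, 2.0, 4.0])
assert np.isclose(np.var(z), np.mean((z - z.mean()) ** 2))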
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
pio.templates.default = 'plotly_white'
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=4, suppress=True)
rng = np.random.default_rng(42)
1) Definition and intuition#
For a single regression target, explained variance is defined as
\[
\mathrm{EVS}(y, \hat{y}) = 1 - \frac{\mathrm{Var}(y - \hat{y})}{\mathrm{Var}(y)}
\]
Numerator: variance of the residuals \(r = y - \hat{y}\) (after removing the residual mean).
Denominator: variance of the target \(y\).
Interpretation:
1.0: residuals have zero variance (they are all equal) → the model captures the shape of \(y\) perfectly.
0.0: residual variance equals target variance → predictions do not reduce the spread.
< 0: residual variance is larger than target variance → worse than a naive baseline.
Key property: shift invariance (bias blindness)#
For any constant \(c\),
\[
\mathrm{EVS}(y, \hat{y} + c) = \mathrm{EVS}(y, \hat{y}),
\]
because adding a constant shifts residuals by \(-c\) but does not change their variance.
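A minimal numeric check of this property (arbitrary synthetic data and seed):
import numpy as np
from sklearn.metrics import explained_variance_score
rng_s = np.random.default_rng(7)
y = rng_s.normal(size=50)
y_hat = y + rng_s.normal(0.0, 0.4, size=50)
# The score is identical for every constant offset c
scores = [explained_variance_score(y, y_hat + c) for c in (-3.0, 0.0, 5.0)]
assert np.allclose(scores, scores[0])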
Edge case: constant targets and force_finite#
If \(\mathrm{Var}(y)=0\) (all targets identical), the fraction above is ill-defined.
scikit-learn uses force_finite=True by default and returns:
1.0 if \(\mathrm{Var}(y-\hat{y})=0\) (e.g., predictions differ from \(y\) by a constant offset)
0.0 otherwise
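A small sketch of these conventions; with force_finite=False, the raw non-finite values (NaN / -inf, as asserted in the parity checks later) come through instead:
import numpy as np
from sklearn.metrics import explained_variance_score
y_const = np.full(5, 3.0)
# Residuals are a constant -> zero residual variance -> 1.0
print(explained_variance_score(y_const, y_const - 1.0))
# Residuals vary -> nonzero residual variance -> 0.0
print(explained_variance_score(y_const, np.arange(5.0)))
# Raw non-finite values with force_finite=False (expect nan, then -inf)
print(explained_variance_score(y_const, y_const - 1.0, force_finite=False))
print(explained_variance_score(y_const, np.arange(5.0), force_finite=False))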
2) Relationship to MSE and \(R^2\)#
Let residuals be \(r = y - \hat{y}\). The mean squared error decomposes as
\[
\mathrm{MSE}(y, \hat{y}) = \frac{1}{n}\sum_{i=1}^n r_i^2 = \mathrm{Var}(r) + \bar{r}^2.
\]
This yields
\[
R^2 = 1 - \frac{\mathrm{MSE}(y, \hat{y})}{\mathrm{Var}(y)} = \mathrm{EVS}(y, \hat{y}) - \frac{\bar{r}^2}{\mathrm{Var}(y)}.
\]
So \(R^2 \le \mathrm{EVS}\), with equality when the mean residual is zero (no additive bias).
Practical takeaway: explained variance measures how well you match the fluctuations of \(y\); it can look great even when you are systematically too high/low.
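A quick numeric verification of both identities (a sketch on arbitrary random data; seed and noise level are just for illustration):
import numpy as np
from sklearn.metrics import explained_variance_score, mean_squared_error, r2_score
rng_chk = np.random.default_rng(0)
y = rng_chk.normal(size=100)
y_hat = y + 0.5 + rng_chk.normal(0.0, 0.3, size=100)  # noisy, biased predictions
r = y - y_hat
# MSE = Var(r) + mean(r)^2
assert np.isclose(mean_squared_error(y, y_hat), r.var() + r.mean() ** 2)
# R^2 = EVS - mean(r)^2 / Var(y)
assert np.isclose(
    r2_score(y, y_hat),
    explained_variance_score(y, y_hat) - r.mean() ** 2 / y.var(),
)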
# Synthetic signal + noise
n = 400
x = np.linspace(0, 2 * np.pi, n)
y_signal = np.sin(x)
y_true = y_signal + rng.normal(0.0, 0.25, size=n)
# A few prediction candidates
preds = {
'signal (denoised)': y_signal,
'signal + constant bias': y_signal + 1.0,
'mean baseline': np.full_like(y_true, y_true.mean()),
'signal + extra noise': y_signal + rng.normal(0.0, 0.9, size=n),
'wrong phase': np.sin(x + 0.7),
}
def summarize(y_true, y_pred):
residuals = y_true - y_pred
return {
'EVS': explained_variance_score(y_true, y_pred),
'R2': r2_score(y_true, y_pred),
'MSE': mean_squared_error(y_true, y_pred),
'residual_mean': float(residuals.mean()),
'residual_var': float(residuals.var()),
}
summaries = {name: summarize(y_true, y_pred) for name, y_pred in preds.items()}
summaries
{'signal (denoised)': {'EVS': 0.8944126033656888,
'R2': 0.8944094638062047,
'MSE': 0.05654500849911091,
'residual_mean': -0.0012966387524886791,
'residual_var': 0.05654332722705647},
'signal + constant bias': {'EVS': 0.8944126033656888,
'R2': -0.9778046284015942,
'MSE': 1.0591382860040883,
'residual_mean': -1.0012966387524886,
'residual_var': 0.05654332722705647},
'mean baseline': {'EVS': 0.0,
'R2': 0.0,
'MSE': 0.53551208789518,
'residual_mean': -3.552713678800501e-17,
'residual_var': 0.53551208789518},
'signal + extra noise': {'EVS': -0.6971716305768838,
'R2': -0.7011039492575686,
'MSE': 0.9109617275936569,
'residual_mean': 0.04588904212295869,
'residual_var': 0.9088559234066942},
'wrong phase': {'EVS': 0.473680853439166,
'R2': 0.47366507095168253,
'MSE': 0.2818587167868259,
'residual_mean': -0.0029071829705829798,
'residual_var': 0.28185026507400146}}
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x,
y=y_true,
mode='markers',
name='y_true',
marker=dict(size=4, opacity=0.5),
)
)
for name, y_pred in preds.items():
evs = summaries[name]['EVS']
r2 = summaries[name]['R2']
fig.add_trace(
go.Scatter(
x=x,
y=y_pred,
mode='lines',
name=f'{name} (EVS={evs:.3f}, R2={r2:.3f})',
)
)
fig.update_layout(
title='Same residual variance ⇒ same EVS (bias does not matter)',
xaxis_title='x',
yaxis_title='y',
legend_title='Predictor',
)
fig
fig = go.Figure()
bins = dict(start=-3, end=3, size=0.1)
for name, y_pred in preds.items():
residuals = y_true - y_pred
fig.add_trace(
go.Histogram(
x=residuals,
name=f"{name} (mean={residuals.mean():.2f}, std={residuals.std():.2f})",
opacity=0.55,
xbins=bins,
)
)
fig.update_layout(
title='Residual distributions (EVS uses variance, not the mean)',
xaxis_title='residual (y_true - y_pred)',
yaxis_title='count',
barmode='overlay',
)
fig
biases = np.linspace(-2.5, 2.5, 101)
evs_vals = []
r2_vals = []
base = preds['signal (denoised)']
for c in biases:
y_pred = base + c
evs_vals.append(explained_variance_score(y_true, y_pred))
r2_vals.append(r2_score(y_true, y_pred))
fig = go.Figure()
fig.add_trace(go.Scatter(x=biases, y=evs_vals, mode='lines', name='EVS'))
fig.add_trace(go.Scatter(x=biases, y=r2_vals, mode='lines', name='R2'))
fig.update_layout(
title='Adding a constant offset changes R2 but not explained variance',
xaxis_title='constant offset c added to predictions',
yaxis_title='score',
)
fig
sigmas = np.linspace(0.0, 2.0, 31)
var_y = float(np.var(y_true))
evs_theory = 1.0 - (sigmas**2) / var_y
n_trials = 300
rng_sim = np.random.default_rng(123)
evs_sim = []
for sigma in sigmas:
scores = []
for _ in range(n_trials):
y_pred = y_true + rng_sim.normal(0.0, sigma, size=n)
scores.append(explained_variance_score(y_true, y_pred))
evs_sim.append(float(np.mean(scores)))
fig = go.Figure()
fig.add_trace(go.Scatter(x=sigmas, y=evs_theory, mode='lines', name='theory: 1 - σ²/Var(y)'))
fig.add_trace(
go.Scatter(
x=sigmas,
y=evs_sim,
mode='markers+lines',
name=f'simulation mean ({n_trials} trials)',
)
)
fig.update_layout(
title='Independent noise reduces EVS proportionally to its variance',
xaxis_title='noise σ (y_pred = y_true + ε, ε ~ N(0, σ²))',
yaxis_title='explained variance score',
)
fig
3) NumPy implementation (from scratch)#
scikit-learn supports:
optional sample_weight (per-sample weights)
multi-output regression (shape (n_samples, n_outputs))
output aggregation via multioutput:
raw_values: one score per output
uniform_average: mean across outputs
variance_weighted: mean weighted by each output's target variance
Below is a compact NumPy implementation that matches sklearn.metrics.explained_variance_score (including force_finite).
def _to_2d(a):
a = np.asarray(a)
if a.ndim == 1:
return a.reshape(-1, 1)
if a.ndim == 2:
return a
raise ValueError(f'Expected 1D or 2D array, got shape {a.shape}')
def _check_sample_weight(sample_weight, n_samples):
if sample_weight is None:
return None
w = np.asarray(sample_weight, dtype=float)
if w.ndim != 1:
raise ValueError(f'sample_weight must be 1D, got shape {w.shape}')
if w.shape[0] != n_samples:
raise ValueError(f'sample_weight length {w.shape[0]} != n_samples {n_samples}')
return w
def _weighted_mean(a, sample_weight):
return np.average(a, weights=sample_weight, axis=0)
def _weighted_variance(a, sample_weight):
mean = _weighted_mean(a, sample_weight)
return np.average((a - mean) ** 2, weights=sample_weight, axis=0)
def explained_variance_score_np(
y_true,
y_pred,
*,
sample_weight=None,
multioutput='uniform_average',
force_finite=True,
):
y_true = _to_2d(y_true)
y_pred = _to_2d(y_pred)
if y_true.shape != y_pred.shape:
raise ValueError(f'y_true shape {y_true.shape} != y_pred shape {y_pred.shape}')
n_samples, n_outputs = y_true.shape
w = _check_sample_weight(sample_weight, n_samples)
residual = y_true - y_pred
numerator = _weighted_variance(residual, w)
denominator = _weighted_variance(y_true, w)
with np.errstate(divide='ignore', invalid='ignore'):
output_scores = 1.0 - (numerator / denominator)
if force_finite:
zero_den = denominator == 0
output_scores = output_scores.copy()
output_scores[zero_den & (numerator == 0)] = 1.0
output_scores[zero_den & (numerator != 0)] = 0.0
if multioutput == 'raw_values':
return output_scores if n_outputs > 1 else float(output_scores[0])
if multioutput == 'uniform_average':
return float(np.mean(output_scores))
if multioutput == 'variance_weighted':
if np.all(denominator == 0):
return float(np.mean(output_scores))
return float(np.average(output_scores, weights=denominator))
# array-like weights
mo = np.asarray(multioutput, dtype=float)
if mo.shape != (n_outputs,):
raise ValueError(f'multioutput weights must have shape ({n_outputs},), got {mo.shape}')
return float(np.average(output_scores, weights=mo))
def mean_squared_error_np(y_true, y_pred, *, sample_weight=None, multioutput='uniform_average'):
y_true = _to_2d(y_true)
y_pred = _to_2d(y_pred)
if y_true.shape != y_pred.shape:
raise ValueError(f'y_true shape {y_true.shape} != y_pred shape {y_pred.shape}')
n_samples, n_outputs = y_true.shape
w = _check_sample_weight(sample_weight, n_samples)
raw = np.average((y_true - y_pred) ** 2, weights=w, axis=0)
if multioutput == 'raw_values':
return raw if n_outputs > 1 else float(raw[0])
if multioutput == 'uniform_average':
return float(np.mean(raw))
mo = np.asarray(multioutput, dtype=float)
if mo.shape != (n_outputs,):
raise ValueError(f'multioutput weights must have shape ({n_outputs},), got {mo.shape}')
return float(np.average(raw, weights=mo))
def r2_score_np(
y_true,
y_pred,
*,
sample_weight=None,
multioutput='uniform_average',
force_finite=True,
):
y_true = _to_2d(y_true)
y_pred = _to_2d(y_pred)
if y_true.shape != y_pred.shape:
raise ValueError(f'y_true shape {y_true.shape} != y_pred shape {y_pred.shape}')
n_samples, n_outputs = y_true.shape
w = _check_sample_weight(sample_weight, n_samples)
numerator = np.average((y_true - y_pred) ** 2, weights=w, axis=0)
denominator = _weighted_variance(y_true, w)
with np.errstate(divide='ignore', invalid='ignore'):
output_scores = 1.0 - (numerator / denominator)
if force_finite:
zero_den = denominator == 0
output_scores = output_scores.copy()
output_scores[zero_den & (numerator == 0)] = 1.0
output_scores[zero_den & (numerator != 0)] = 0.0
if multioutput == 'raw_values':
return output_scores if n_outputs > 1 else float(output_scores[0])
if multioutput == 'uniform_average':
return float(np.mean(output_scores))
if multioutput == 'variance_weighted':
if np.all(denominator == 0):
return float(np.mean(output_scores))
return float(np.average(output_scores, weights=denominator))
mo = np.asarray(multioutput, dtype=float)
if mo.shape != (n_outputs,):
raise ValueError(f'multioutput weights must have shape ({n_outputs},), got {mo.shape}')
return float(np.average(output_scores, weights=mo))
# Quick parity checks vs scikit-learn (random data)
for n_outputs in [1, 3]:
y_true = rng.normal(size=(200, n_outputs))
y_pred = y_true + rng.normal(scale=0.8, size=(200, n_outputs))
w = rng.uniform(0.1, 2.0, size=200)
for mo in ['raw_values', 'uniform_average', 'variance_weighted']:
sk = explained_variance_score(y_true, y_pred, sample_weight=w, multioutput=mo)
ours = explained_variance_score_np(y_true, y_pred, sample_weight=w, multioutput=mo)
np.testing.assert_allclose(ours, sk, rtol=1e-12, atol=1e-12)
# Degenerate denominator behavior (constant y)
y_const = np.ones(10)
y_const_pred_const = np.zeros_like(y_const)
y_const_pred_vary = np.arange(y_const.size)
np.testing.assert_allclose(
explained_variance_score_np(y_const, y_const_pred_const),
explained_variance_score(y_const, y_const_pred_const),
)
np.testing.assert_allclose(
explained_variance_score_np(y_const, y_const_pred_vary),
explained_variance_score(y_const, y_const_pred_vary),
)
# `force_finite=False` matches NaN/-inf conventions
np.testing.assert_allclose(
explained_variance_score_np(y_const, y_const_pred_const, force_finite=False),
explained_variance_score(y_const, y_const_pred_const, force_finite=False),
equal_nan=True,
)
np.testing.assert_allclose(
explained_variance_score_np(y_const, y_const_pred_vary, force_finite=False),
explained_variance_score(y_const, y_const_pred_vary, force_finite=False),
equal_nan=True,
)
'All checks passed.'
/home/tempa/miniconda3/lib/python3.12/site-packages/sklearn/metrics/_regression.py:930: RuntimeWarning:
invalid value encountered in divide
/home/tempa/miniconda3/lib/python3.12/site-packages/sklearn/metrics/_regression.py:930: RuntimeWarning:
divide by zero encountered in divide
'All checks passed.'
4) Multi-output aggregation (multioutput)#
For multi-output regression (shape (n_samples, n_outputs)), EVS is computed per output first.
Then multioutput decides how to combine them:
raw_values: return one score per output
uniform_average: mean of the per-output scores
variance_weighted: weighted mean using each output's target variance (outputs with more variance count more)
This matters when different targets live on very different scales.
n = 300
y1 = 10 * rng.normal(size=n) # high-variance target
y2 = 0.2 * rng.normal(size=n) # low-variance target
y_true_mo = np.column_stack([y1, y2])
# Predictions: do well on y2, poorly on y1
y_pred_mo = np.column_stack(
[
y1 + rng.normal(0, 15, size=n),
y2 + rng.normal(0, 0.05, size=n),
]
)
raw = explained_variance_score(y_true_mo, y_pred_mo, multioutput='raw_values')
uniform = explained_variance_score(y_true_mo, y_pred_mo, multioutput='uniform_average')
var_w = explained_variance_score(y_true_mo, y_pred_mo, multioutput='variance_weighted')
raw, uniform, var_w
(array([-1.1705, 0.931 ]), -0.11973865514890675, -1.169799844269795)
fig = go.Figure()
fig.add_trace(
go.Bar(
x=['output 1 (high var)', 'output 2 (low var)'],
y=raw,
name='raw EVS',
)
)
fig.update_layout(
title=f'Aggregation: uniform={uniform:.3f}, variance_weighted={var_w:.3f}',
yaxis_title='explained variance score',
)
fig
5) Sample weights (sample_weight)#
With per-sample weights \(w_i \ge 0\), sklearn uses weighted means and variances:
\[
\bar{z}_w = \frac{\sum_i w_i z_i}{\sum_i w_i},
\qquad
\mathrm{Var}_w(z) = \frac{\sum_i w_i (z_i - \bar{z}_w)^2}{\sum_i w_i},
\qquad
\mathrm{EVS}_w = 1 - \frac{\mathrm{Var}_w(y - \hat{y})}{\mathrm{Var}_w(y)}
\]
So samples with larger weight influence both the numerator and denominator more. A common use case is heteroscedastic noise (some samples are much noisier / less reliable).
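A consequence of these formulas (a small sketch, not an official sklearn example): integer weights behave like frequency weights, i.e., weight \(k\) is equivalent to repeating that sample \(k\) times:
import numpy as np
from sklearn.metrics import explained_variance_score
rng_w = np.random.default_rng(1)
y = rng_w.normal(size=6)
y_hat = y + rng_w.normal(0.0, 0.5, size=6)
w = np.array([1, 2, 1, 3, 1, 2])
idx = np.repeat(np.arange(6), w)  # sample i appears w[i] times
assert np.isclose(
    explained_variance_score(y, y_hat, sample_weight=w),
    explained_variance_score(y[idx], y_hat[idx]),
)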
n = 240
x = rng.normal(size=n)
y_clean = 2.0 * x
# First third is much noisier
mask_noisy = np.arange(n) < (n // 3)
noise = rng.normal(0.0, 0.2, size=n)
noise[mask_noisy] = rng.normal(0.0, 1.5, size=mask_noisy.sum())
y_true_w = y_clean + noise
y_pred_w = y_clean # model predicts the underlying signal
weights = np.ones(n)
weights[mask_noisy] = 0.2 # downweight noisy samples
score_unweighted = explained_variance_score(y_true_w, y_pred_w)
score_weighted = explained_variance_score(y_true_w, y_pred_w, sample_weight=weights)
score_unweighted, score_weighted
(0.8436035954505202, 0.9504664689590617)
order = np.argsort(x)
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=x,
y=y_true_w,
mode='markers',
name='y_true',
marker=dict(
size=6,
color=weights,
colorscale='Viridis',
showscale=True,
colorbar=dict(title='weight'),
opacity=0.85,
),
)
)
fig.add_trace(
go.Scatter(
x=x[order],
y=y_pred_w[order],
mode='lines',
name='y_pred (signal)',
line=dict(color='black'),
)
)
fig.update_layout(
title=(
'Downweighting noisy samples increases EVS: '
f'unweighted={score_unweighted:.3f}, weighted={score_weighted:.3f}'
),
xaxis_title='x',
yaxis_title='y',
)
fig
6) Optimizing EVS directly (linear regression)#
explained_variance_score is primarily an evaluation metric. But it’s instructive to see what happens if you try to optimize it.
Maximizing EVS is equivalent to minimizing the residual variance (the denominator \(\mathrm{Var}(y)\) does not depend on the model):
\[
\max_{w,b}\; \mathrm{EVS}(y, \hat{y}) \iff \min_{w,b}\; \mathrm{Var}(y - \hat{y})
\]
For a linear model \(\hat{y} = Xw + b\), define centered residuals \(r_c = r - \bar{r}\) and the loss \(L = \frac{1}{n}\sum_{i=1}^n r_{c,i}^2\).
The gradients are
\[
\frac{\partial L}{\partial w} = -\frac{2}{n} X^\top r_c,
\qquad
\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^n r_{c,i} = 0,
\]
since centered residuals sum to zero by construction.
So the intercept is not identifiable under this objective: any constant shift in \(\hat{y}\) leaves EVS unchanged. We’ll see that in gradient descent: \(b\) barely moves, EVS can look good, but \(R^2\) / MSE reveal the bias.
def fit_linear_gd(
X,
y,
*,
loss,
lr=0.05,
n_steps=300,
w0=None,
b0=0.0,
):
X = np.asarray(X, dtype=float)
y = np.asarray(y, dtype=float).reshape(-1)
n_samples, n_features = X.shape
w = np.zeros(n_features) if w0 is None else np.asarray(w0, dtype=float).copy()
b = float(b0)
history = {'step': [], 'mse': [], 'r2': [], 'evs': [], 'b': []}
for step in range(n_steps):
y_pred = X @ w + b
r = y - y_pred
if loss == 'mse':
grad_w = -(2.0 / n_samples) * (X.T @ r)
grad_b = -(2.0 / n_samples) * float(np.sum(r))
elif loss == 'resid_var':
r_c = r - r.mean()
grad_w = -(2.0 / n_samples) * (X.T @ r_c)
grad_b = -(2.0 / n_samples) * float(np.sum(r_c)) # always ~0
else:
raise ValueError("loss must be 'mse' or 'resid_var'")
w -= lr * grad_w
b -= lr * grad_b
y_pred = X @ w + b
history['step'].append(step)
history['mse'].append(mean_squared_error_np(y, y_pred))
history['r2'].append(r2_score_np(y, y_pred))
history['evs'].append(explained_variance_score_np(y, y_pred))
history['b'].append(b)
return w, b, history
# Synthetic linear regression problem
n = 500
X = rng.normal(size=(n, 1))
w_true = np.array([2.5])
b_true = 1.7
y = b_true + X @ w_true + rng.normal(0.0, 0.5, size=n)
y = y.reshape(-1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
w_mse, b_mse, hist_mse = fit_linear_gd(X_train, y_train, loss='mse', lr=0.05, n_steps=300)
w_evs, b_evs, hist_evs = fit_linear_gd(X_train, y_train, loss='resid_var', lr=0.05, n_steps=300)
def evaluate(X_, y_, w, b):
y_pred = X_ @ w + b
return {
'MSE': mean_squared_error_np(y_, y_pred),
'R2': r2_score_np(y_, y_pred),
'EVS': explained_variance_score_np(y_, y_pred),
'residual_mean': float(np.mean(y_ - y_pred)),
}
b_evs_post = float(y_train.mean() - (X_train.mean(axis=0) @ w_evs))
results = {
'train (MSE loss)': evaluate(X_train, y_train, w_mse, b_mse),
'test (MSE loss)': evaluate(X_test, y_test, w_mse, b_mse),
'train (EVS loss)': evaluate(X_train, y_train, w_evs, b_evs),
'test (EVS loss)': evaluate(X_test, y_test, w_evs, b_evs),
'test (EVS loss + post-hoc intercept)': evaluate(X_test, y_test, w_evs, b_evs_post),
}
{
'true': {'w': w_true, 'b': b_true},
'trained_on_mse': {'w': w_mse, 'b': b_mse},
'trained_on_evs': {'w': w_evs, 'b': b_evs},
'evs_posthoc_intercept': b_evs_post,
'metrics': results,
}
{'true': {'w': array([2.5]), 'b': 1.7},
'trained_on_mse': {'w': array([2.4944]), 'b': 1.6951145770136242},
'trained_on_evs': {'w': array([2.4944]), 'b': -7.917476198469688e-17},
'evs_posthoc_intercept': 1.6951145770146998,
'metrics': {'train (MSE loss)': {'MSE': 0.22740982130503978,
'R2': 0.9646517961590391,
'EVS': 0.9646517961590391,
'residual_mean': 9.732659123073972e-13},
'test (MSE loss)': {'MSE': 0.2563190118563668,
'R2': 0.9652494187042723,
'EVS': 0.9662444560416645,
'residual_mean': 0.0856700581900787},
'train (EVS loss)': {'MSE': 3.1008232505127644,
'R2': 0.5180131987928035,
'EVS': 0.9646517961590391,
'residual_mean': 1.6951145770147},
'test (EVS loss)': {'MSE': 3.4201735699636053,
'R2': 0.5363082167501443,
'EVS': 0.9662444560416775,
'residual_mean': 1.780784635203714},
'test (EVS loss + post-hoc intercept)': {'MSE': 0.2563190118560888,
'R2': 0.9652494187043099,
'EVS': 0.9662444560416775,
'residual_mean': 0.08567005818901435}}}
fig = go.Figure()
fig.add_trace(
go.Scatter(
x=hist_mse['step'],
y=hist_mse['evs'],
mode='lines',
name='EVS (trained on MSE)',
)
)
fig.add_trace(
go.Scatter(
x=hist_evs['step'],
y=hist_evs['evs'],
mode='lines',
name='EVS (trained on Var(resid))',
)
)
fig.add_trace(
go.Scatter(
x=hist_mse['step'],
y=hist_mse['r2'],
mode='lines',
name='R2 (trained on MSE)',
line=dict(dash='dash'),
)
)
fig.add_trace(
go.Scatter(
x=hist_evs['step'],
y=hist_evs['r2'],
mode='lines',
name='R2 (trained on Var(resid))',
line=dict(dash='dash'),
)
)
fig.update_layout(
title='Training curves: EVS can look good even with a biased intercept',
xaxis_title='gradient descent step',
yaxis_title='score',
)
fig
fig = go.Figure()
fig.add_trace(go.Scatter(x=hist_mse['step'], y=hist_mse['b'], mode='lines', name='b (MSE loss)'))
fig.add_trace(go.Scatter(x=hist_evs['step'], y=hist_evs['b'], mode='lines', name='b (EVS loss)'))
fig.update_layout(
title='Intercept during training (EVS-loss has ~zero gradient)',
xaxis_title='step',
yaxis_title='b',
)
fig
y_pred_test_mse = X_test @ w_mse + b_mse
y_pred_test_evs = X_test @ w_evs + b_evs
y_pred_test_evs_post = X_test @ w_evs + b_evs_post
low = float(
np.min(
[y_test.min(), y_pred_test_mse.min(), y_pred_test_evs.min(), y_pred_test_evs_post.min()]
)
)
high = float(
np.max(
[y_test.max(), y_pred_test_mse.max(), y_pred_test_evs.max(), y_pred_test_evs_post.max()]
)
)
fig = go.Figure()
fig.add_trace(go.Scatter(x=y_test, y=y_pred_test_mse, mode='markers', name='trained on MSE', opacity=0.7))
fig.add_trace(
go.Scatter(x=y_test, y=y_pred_test_evs, mode='markers', name='trained on Var(resid)', opacity=0.7)
)
fig.add_trace(
go.Scatter(
x=y_test,
y=y_pred_test_evs_post,
mode='markers',
name='Var(resid) + post-hoc intercept',
opacity=0.7,
)
)
fig.add_trace(
go.Scatter(
x=[low, high],
y=[low, high],
mode='lines',
name='ideal',
line=dict(color='black', dash='dot'),
)
)
fig.update_layout(
title='Predicted vs true (test): EVS-loss can be systematically shifted',
xaxis_title='y_true',
yaxis_title='y_pred',
)
fig
7) Practical usage (scikit-learn)#
In practice you almost never train by maximizing EVS; you train with a proper loss (e.g., MSE/MAE) and evaluate with EVS. Here’s the standard sklearn pattern:
lin = LinearRegression().fit(X_train, y_train)
y_pred = lin.predict(X_test)
{
'coef': lin.coef_,
'intercept': lin.intercept_,
'EVS': explained_variance_score(y_test, y_pred),
'R2': r2_score(y_test, y_pred),
'MSE': mean_squared_error(y_test, y_pred),
}
{'coef': array([2.4944]),
'intercept': 1.6951145770147051,
'EVS': 0.9662444560416782,
'R2': 0.9652494187043107,
'MSE': 0.2563190118560831}
8) Pros, cons, and when to use it#
Pros#
Unitless and easy to compare across scales (like \(R^2\)).
Directly measures how much residual spread remains relative to the target spread.
Supports sample weighting and multi-output aggregation.
Cons / pitfalls#
Ignores additive bias: a constant offset does not change the score (can be very misleading).
Not a great training objective: intercept is not identifiable (zero gradient under the EVS-equivalent loss).
For nearly-constant targets, the metric can be unstable/ill-defined; force_finite=True can hide that.
Good use cases#
You care about matching fluctuations and will correct offsets separately (post-hoc calibration / de-meaning; see the sketch after this list).
Targets are naturally centered/detrended, so additive bias is not meaningful.
Multi-output settings where you want to quantify variance capture per output and then aggregate.
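A minimal sketch of post-hoc de-meaning (synthetic data; in practice, estimate the offset on a held-out calibration set):
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score
rng_c = np.random.default_rng(3)
y = rng_c.normal(size=200)
y_hat = y + 1.5 + rng_c.normal(0.0, 0.3, size=200)  # right shape, constant offset
offset = np.mean(y - y_hat)  # bias estimate
y_hat_cal = y_hat + offset
# R2 improves after de-meaning; EVS is unchanged (shift invariance)
print(r2_score(y, y_hat), r2_score(y, y_hat_cal))
print(explained_variance_score(y, y_hat), explained_variance_score(y, y_hat_cal))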
9) Diagnostics: what to check alongside EVS#
Because EVS can be blind to bias, pair it with:
Residual mean (bias): \(\bar{r}\)
MAE / MSE / RMSE (accuracy)
\(R^2\) (captures both variance + bias effects)
plots: \(y\) vs \(\hat{y}\), residual histogram, residuals vs features/time
10) Exercises#
Construct two predictors with the same EVS but very different \(R^2\). Explain using the MSE decomposition.
Implement multioutput weighting by hand for 3 outputs and verify against sklearn.
In the gradient descent section, initialize \(b\) to a large value and show that the EVS-loss optimizer does not fix it.
(Time series) Detrend a signal, fit a model, and compare EVS before/after detrending.
References#
scikit-learn API: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.explained_variance_score.html
scikit-learn regression metrics overview: https://scikit-learn.org/stable/modules/model_evaluation.html#regression-metrics